3D Generation: InstantMesh Training, Inference, and Eval Scripts #681

Open
wants to merge 26 commits into
base: master


@HaFred (Collaborator) commented Oct 1, 2024

This PR implements InstantMesh for 3D meshing using multiview images. The torch-cuda version with textures is shown below.

images

The input is a multiview video of shape n×3×h×w, and the output is a 3D mesh file, mesh.obj, containing vertices and faces.
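
For readers unfamiliar with the output format, here is a minimal sketch of how vertices and triangular faces map onto a Wavefront OBJ file. The helper `write_obj`, the toy tetrahedron, and the file name `toy_mesh.obj` are illustrative assumptions, not code from this PR.

```python
def write_obj(path, vertices, faces):
    """Write a triangle mesh to a Wavefront OBJ file.

    vertices: iterable of (x, y, z) floats.
    faces: iterable of (i, j, k) zero-based vertex indices.
    """
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for i, j, k in faces:
            # OBJ face indices are 1-based, so shift each index by one.
            f.write(f"f {i + 1} {j + 1} {k + 1}\n")


# A single tetrahedron as a toy mesh (hypothetical data).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
write_obj("toy_mesh.obj", vertices, faces)
```

The resulting file has one `v` line per vertex followed by one `f` line per triangle. Texture coordinates and normals are omitted in this sketch.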
The hardware environment specs are as follows.

Mindspore Version:               2.3.1 release
CANN Version:                    7.3
Ascend Driver:                   23.0.rc3.6

3D Mesh Demos

Inference demos with the pretrained checkpoint can be found in the README. They can be viewed and interacted with smoothly in any HTML renderer, such as the one in VSCode, as shown below.
image

These links can be found here for easy access.

The input multiview images are illustrated here, respectively. Note that the input multiview images can be obtained from either the SV3D pipeline or, as in the original implementation, the Zero123++ pipeline; the paper's core contribution is the process of building a 3D mesh from multiview images.

Limitations in Inference

1. Cache Miss in mint.unique() and other ACLNN Operators

As introduced in the README, InstantMesh extracts the isosurface for meshing using FlexiCubes, which requires a unique operation to determine the surfaces of an object. Unfortunately, mint.unique() is not yet supported by ACLNN, only by AICPU, which, according to the MindSpore framework colleagues, causes the program to stall on cache misses. Once mint.unique() is fully supported on A+M, we will implement 3D meshing with FlexiCubes for higher resolution.

(Update on August 1st: It turns out that the CANN operator aclnnUniqueDim takes too long. We are working with the framework and CANN teams on a solution.)
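
To make the role of the unique operation concrete, below is a small host-side sketch using NumPy rather than `mint.unique()`. It assumes a typical use in isosurface extraction: deduplicating edges that neighbouring cells share. The edge list is hypothetical toy data, and this is not the FlexiCubes code itself.

```python
import numpy as np

# Hypothetical edge list: each row is a (start, end) pair of grid-vertex indices.
# Neighbouring cells report the same edge more than once.
edges = np.array([
    [0, 1],
    [1, 2],
    [0, 1],
    [2, 3],
    [1, 2],
])

# Row-wise unique (the NumPy analogue of a unique along dim 0).
unique_edges, inverse, counts = np.unique(
    edges, axis=0, return_inverse=True, return_counts=True
)
# Flatten defensively: the shape of `inverse` has varied across NumPy releases.
inverse = inverse.reshape(-1)

print(unique_edges)  # deduplicated edges, sorted row-wise
print(inverse)       # for every input edge, its row index in unique_edges
print(counts)        # how many times each unique edge occurred
```

The `inverse` mapping is what lets per-edge results, such as an interpolated surface vertex, be computed once and then shared by every cell that touches the edge.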

Workaround: 3D Meshing with the Raw Triplane Features using Marching Cubes

For the reasons above, we need a workaround to extract 3D meshes from multiview images (from SV3D in our case). In this PR, since we already have a rough SDF produced by the SDF MLP heads from the triplane features, a straightforward approach is to feed that rough SDF to a classic isosurface extraction method such as Marching Cubes. Because Marching Cubes lacks the degrees of freedom needed to represent high-quality meshes, it tends to use more vertices (and therefore more faces, i.e., triangles) to fit an irregular 3D shape, especially when the 3D shape cannot be approximated to a surface. Details can be found in the FlexiCubes paper.

| FlexiCubes | Marching Cubes |
| --- | --- |
| image | image |
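
As a rough illustration of the first step of Marching Cubes on an SDF grid, the sketch below finds the grid cells that the zero level set passes through. The analytic sphere SDF, the grid size `n`, and all variable names are assumptions for illustration; in the PR the SDF would come from the SDF MLP heads instead.

```python
import numpy as np

# Hypothetical stand-in for the rough SDF: a sphere of radius 0.5 on a 16^3 grid.
n = 16
axis = np.linspace(-1.0, 1.0, n)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5

# Sign of the SDF at every grid corner (True means inside the shape).
inside = sdf < 0.0

# Gather the 8 corners of every cell: shape (n-1, n-1, n-1, 8).
corners = np.stack(
    [
        inside[dx:n - 1 + dx, dy:n - 1 + dy, dz:n - 1 + dz]
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ],
    axis=-1,
)

# A cell is crossed by the surface when its corners disagree in sign.
n_inside = corners.sum(axis=-1)
active = (n_inside > 0) & (n_inside < 8)

print("cells crossed by the surface:", int(active.sum()))
```

Marching Cubes then emits triangles for each active cell from a fixed lookup table indexed by the corner sign pattern, which is why it has limited freedom in where vertices are placed. If scikit-image is installed, `skimage.measure.marching_cubes(sdf, level=0.0)` performs the full extraction on such a grid.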

2. CUDA Extension for Rasterization

InstantMesh uses nvdiffrast for UV map rasterization and 2D rendering of the FlexiCubes 3D volume. This is bypassed for now.

@HaFred HaFred requested a review from vigo999 as a code owner October 8, 2024 08:13

The illustrations here are better viewed in viewers with HTML support (e.g., the VSCode built-in viewer).

## Environments